Illiberal Communication and Election Intervention During the Refugee Crisis in Germany


Ashrakat Elshehawy, Konstantin Gavras, Nikolay Marinov, Federico Nanni, Harald Schoen


Perspectives on Politics


Readme file


Dataverse link:


"Replication Data for: Illiberal Communication and Election Intervention During the Refugee Crisis in Germany", https://doi.org/10.7910/DVN/T2FZK3, Harvard Dataverse, DRAFT VERSION, UNF:6:L4g980UvlhsPseyzqCxmKw== [fileUNF]


We have two types of data files:


1 Refugee Relevant Media pieces


(a) rt_relevant-migrant-news.csv
(b) taz_relevant-migrant-news.csv
(c) sueddeutsche_relevant-migrant-news.csv
(d) welt_relevant-migrant-news.csv
(e) bild_relevant-migrant-news.csv
(f) faz_relevant-migrant-news.csv
(g) sputnik_relevant-migrant-news.csv


2 Refugee Relevant Party Communication


(a) cdu_refugeerelevant.tsv
(b) fdp_refugeerelevant.csv
(c) greens_refugeerelevant.csv
(d) afd_refugeerelevant.csv
(e) linke_refugeerelevant.csv
(f) spd_refugeerelevant.csv


Both are available as text files in the dataverse. 



We use the Python notebook Sentiment Analysis.ipynb to create a sentiment score for each refugee-relevant media piece and party communication piece.


Input: 
1. Refugee Relevant Media pieces, 
2. Refugee Relevant Party Communication, 
3. SentiWS_v1.8c_Negative.txt, 
4. SentiWS_v1.8c_Positive.txt


Output:
1. bild_sentiment.csv
2. faz_sentiment.csv
3. rt_sentiment.csv
4. sputnik_sentiment.csv
5. sueddeutsche_sentiment.csv
6. taz_sentiment.csv
7. welt_sentiment.csv
8. cdu_sentiment.csv
9. spd_sentiment.csv
10. afd_sentiment.csv
11. fdp_sentiment.csv
12. linke_sentiment.csv
13. greens_sentiment.csv
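

The notebook itself defines how the scores are computed. As a rough illustration only, a dictionary-based score from the SentiWS word lists could be sketched as below. The sketch assumes the usual SentiWS line layout (lemma|POS, a tab, the weight, a tab, comma-separated inflections). The function names, the lowercasing and the averaging over matched tokens are assumptions made for this example, not the authors' procedure.

```python
import re
from typing import Dict, Iterable


def parse_sentiws(lines: Iterable[str]) -> Dict[str, float]:
    """Parse lines of the form 'Lemma|POS<TAB>weight<TAB>infl1,infl2,...'."""
    lexicon: Dict[str, float] = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        lemma = parts[0].split("|")[0]
        weight = float(parts[1])
        forms = [lemma]
        if len(parts) > 2 and parts[2]:
            forms.extend(parts[2].split(","))
        for form in forms:
            # Lowercasing is a simplification (assumption); it merges
            # capitalised nouns with same-spelled adjectives.
            lexicon[form.lower()] = weight
    return lexicon


def sentiment_score(text: str, lexicon: Dict[str, float]) -> float:
    """Mean weight of matched tokens; 0.0 when nothing matches (assumed convention)."""
    tokens = re.findall(r"\w+", text.lower())
    weights = [lexicon[t] for t in tokens if t in lexicon]
    return sum(weights) / len(weights) if weights else 0.0
```

To use it, read SentiWS_v1.8c_Negative.txt and SentiWS_v1.8c_Positive.txt line by line, merge the two parsed dictionaries, and call sentiment_score on the text of each media or party piece.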



We visualize the sentiment scores using the R script ‘Script_sentimentAnalysis.R’, which creates Figures 1, 2, 6 and 7. It takes as input the estimated sentiment scores of the refugee-relevant media and party communication data (the output of the step above).



We have a Python script that determines the conspiracy score of the media pieces.


Python Script: Topic_Classify.ipynb


It takes the Refugee Relevant Media pieces listed above and generates two new dataframes that include conspiracy scores. Each dataframe is produced using a different word-embedding file.


Input:


(a)  rt_relevant-migrant-news.csv
(b)  taz_relevant-migrant-news.csv
(c)  sueddeutsche_relevant-migrant-news.csv
(d)  welt_relevant-migrant-news.csv
(e)  bild_relevant-migrant-news.csv
(f)  faz_relevant-migrant-news.csv
(g)  sputnik_relevant-migrant-news.csv


Other needed input:

- Word-embeddings: 

1) in-domain embedding.txt 
2) wiki.de.vec

- A dictionary called topics-classification.txt (includes the conspiracy keywords)


Output:

1) refugee-indomain-class-score.csv
2) refugee-wiki-class-score.csv
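

This README does not spell out how the notebook combines the keyword dictionary with an embedding file. One common approach, shown here purely as an illustrative assumption, is the cosine similarity between the averaged word vectors of an article and the averaged vectors of the dictionary keywords. The sketch reads vectors in the plain-text layout used by files such as wiki.de.vec (one word per line followed by its values, with an optional "count dimension" header). All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional

Vector = List[float]


def load_vec(lines: Iterable[str]) -> Dict[str, Vector]:
    """Read 'word v1 v2 ... vd' lines; the two-field header line is skipped."""
    vectors: Dict[str, Vector] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:
            continue  # header ('count dim') or blank line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def mean_vector(words: Iterable[str], vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all words that have an embedding."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_score(doc_tokens: Iterable[str], keywords: Iterable[str],
                vectors: Dict[str, Vector]) -> float:
    """Assumed scoring rule: similarity of mean document and mean keyword vectors."""
    doc_vec = mean_vector(doc_tokens, vectors)
    key_vec = mean_vector(keywords, vectors)
    if doc_vec is None or key_vec is None:
        return 0.0
    return cosine(doc_vec, key_vec)
```

Running the same scoring once per embedding file would give two score tables, in the spirit of the two outputs listed above.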



On the basis of the output of the Python code above, an R script then produces Figures 3 and 10.

R Code: Figure3and10.R

Input: refugee-indomain-class-score_v2.csv, refugee-wikifeb-class-score_v2.csv
Output: Figure 3 and Figure 10


Note: In our German data, specifically the FAZ media articles, around 292 articles consisted largely of Java code rather than real article text. We therefore dropped these cases before creating our plots.


We have a Stata data file based on the media pieces relevant to refugees.


The do file analyze.do takes this data and saves the results shown in Figure 4. A small separate do file, graph_es.do, draws that graph using the data produced by analyze.do and saved as ar.csv.


At the end, analyze.do also creates Figures 5a and 5b: it takes the media data and plots the number of stories per month.


To recap that portion:


*        analyze.do  -> main do file for Fig 4, 5a, 5b


*        graph_es.do -> uses ar.csv, which is generated by manually combining ar_RT.csv and ar_Sputnik into a single file (instructions in file)


*        Required data files:


*        main_data2.tsv <-- basic data frame


*        asyl.dta  <-- imports EU monthly asylum applicant data (for Germany)


*        files with monthly stories by outlet: bild_news_relevant-migrant-news.dta faz_news_relevant-migrant-news.dta rt_news_relevant-migrant-news.dta sputnik_relevant-migrant-news.dta sueddeutsche_relevant-migrant-news.dta taz_relevant-migrant-news.dta welt_relevant-migrant-news.dta
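

The authoritative instructions for combining the RT and Sputnik result files into ar.csv are in graph_es.do. If one wanted to script that step instead of doing it by hand, a minimal sketch could look like the following. It assumes that both files are CSVs sharing the same header row, which this README does not confirm, and the function name is hypothetical.

```python
import csv
from typing import List


def combine_csv(paths: List[str], out_path: str) -> int:
    """Stack CSV files with identical headers; return the number of data rows written."""
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty input file
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the shared header once
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

The header check fails loudly rather than silently misaligning columns, which matters when the combined file feeds a figure.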



We have Python code that performs the ablation analysis, which shows how dropping a word from the conspiracy dictionary affects the conspiracy scores. The resulting data is used in Figure 11 in the Online Appendix.


Python Code: Keyword-Removal-Topic-Classifiy.ipynb
Input:


(a)  rt_relevant-migrant-news.csv
(b)  taz_relevant-migrant-news.csv
(c)  sueddeutsche_relevant-migrant-news.csv
(d)  welt_relevant-migrant-news.csv
(e)  bild_relevant-migrant-news.csv
(f)  faz_relevant-migrant-news.csv
(g)  sputnik_relevant-migrant-news.csv


Other needed input:
- Word-embeddings:  wiki.de.vec
- A dictionary called topics-classification.txt - it includes the conspiracy keywords


Output:

PopulismConspiracyColRevolt-ablationdata.csv
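

The leave-one-keyword-out idea behind the ablation can be sketched generically as follows. This is an illustration under assumptions, not the notebook's code: score_fn stands for whatever function maps a keyword list to a conspiracy score, and the function name and the choice to report the difference from the full-dictionary score are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def ablation_scores(keywords: Sequence[str],
                    score_fn: Callable[[List[str]], float]) -> Dict[str, float]:
    """For each keyword, return score(dictionary without it) minus score(full dictionary)."""
    baseline = score_fn(list(keywords))
    changes: Dict[str, float] = {}
    for i, word in enumerate(keywords):
        # Rebuild the dictionary with exactly one keyword removed.
        reduced = [k for j, k in enumerate(keywords) if j != i]
        changes[word] = score_fn(reduced) - baseline
    return changes
```

A large negative change for a keyword would indicate that the score depends heavily on that single word.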

Figure 11: Ablation Analysis


R Code: Figure11.R
Input: PopulismConspiracyColRevolt-ablationdata_v2.csv
Output: Figure 11 in the Online Appendix


Note: As for Figures 3 and 10, around 292 FAZ media articles in our German data consisted largely of Java code rather than real article text. We therefore dropped these cases before creating our plots.

